169 research outputs found

    Risk, Unexpected Uncertainty, and Estimation Uncertainty: Bayesian Learning in Unstable Settings

    Recently, evidence has emerged that humans approach learning using Bayesian updating rather than (model-free) reinforcement algorithms in a six-arm restless bandit problem. Here, we investigate what this implies for human appreciation of uncertainty. In our task, a Bayesian learner distinguishes three equally salient levels of uncertainty. First, the Bayesian perceives irreducible uncertainty or risk: even knowing the payoff probabilities of a given arm, the outcome remains uncertain. Second, there is (parameter) estimation uncertainty or ambiguity: payoff probabilities are unknown and need to be estimated. Third, the outcome probabilities of the arms change: the sudden jumps are referred to as unexpected uncertainty. We document how the three levels of uncertainty evolved during the course of our experiment and how they affected the learning rate. We then zoom in on estimation uncertainty, which has been suggested to be a driving force in exploration, in spite of evidence of widespread aversion to ambiguity. Our data corroborate the latter. We discuss neural evidence that foreshadowed the ability of humans to distinguish between the three levels of uncertainty. Finally, we investigate the boundaries of human capacity to implement Bayesian learning. We repeat the experiment with different instructions, reflecting varying levels of structural uncertainty. Under this fourth notion of uncertainty, choices were no better explained by Bayesian updating than by (model-free) reinforcement learning. Exit questionnaires revealed that participants remained unaware of the presence of unexpected uncertainty and failed to acquire the right model with which to implement Bayesian updating.
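
    To make the three levels concrete, below is a minimal Python sketch (not the authors' model) of a Beta-Bernoulli learner for a single bandit arm: risk is the outcome variance that remains even at the posterior mean, estimation uncertainty is the posterior variance over the payoff probability, and the forgetting factor `decay` is a hypothetical device for coping with unexpected uncertainty (sudden jumps).

        import numpy as np

        # Minimal sketch, assuming a Beta-Bernoulli learner with exponential
        # forgetting; `decay` is an illustrative assumption, not the paper's model.
        class BetaArm:
            def __init__(self, decay=0.95):
                self.a, self.b = 1.0, 1.0   # Beta(1, 1) prior over payoff probability
                self.decay = decay          # discounts old evidence after jumps

            def update(self, reward):
                # Forgetting pulls the posterior back toward the prior, keeping
                # the learning rate elevated in an unstable environment.
                self.a = 1.0 + self.decay * (self.a - 1.0) + reward
                self.b = 1.0 + self.decay * (self.b - 1.0) + (1 - reward)

            def uncertainties(self):
                mean = self.a / (self.a + self.b)
                risk = mean * (1 - mean)          # irreducible outcome variance
                n = self.a + self.b
                estimation = (self.a * self.b) / (n**2 * (n + 1))  # posterior variance
                return mean, risk, estimation

        arm = BetaArm()
        rng = np.random.default_rng(0)
        p_true = 0.8
        for t in range(50):
            if t == 25:          # unexpected uncertainty: payoff probability jumps
                p_true = 0.2
            arm.update(int(rng.random() < p_true))
        print(arm.uncertainties())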

    Under pressure: Response urgency modulates striatal and insula activity during decision-making under risk

    When deciding whether to bet in situations that involve potential monetary loss or gain (mixed gambles), a subjective sense of pressure can influence the evaluation of the expected utility associated with each choice option. Here, we explored how gambling decisions, and their psychophysiological and neural counterparts, are modulated by an induced sense of urgency to respond. Urgency influenced decision times and evoked heart rate responses, interacting with the expected value of each gamble. Using functional MRI, we observed that this interaction was associated with changes in the activity of the striatum, a critical region for both reward and choice selection, and within the insula, a region implicated as the substrate of affective feelings arising from interoceptive signals that influence motivational behavior. Our findings bridge current psychophysiological and neurobiological models of value representation and action-programming, identifying the striatum and insular cortex as key substrates of decision-making under risk and urgency.
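
    For reference, the expected utility of a mixed gamble is often written with a loss-aversion weight, as in prospect theory; the sketch below is purely illustrative (the 50/50 probabilities and the weight `lam` are assumptions, not the study's fitted model).

        # Illustrative only: expected utility of a 50/50 mixed gamble with
        # potential gain G and potential loss L, weighted by loss aversion lam.
        def mixed_gamble_eu(gain, loss, lam=2.0):
            return 0.5 * gain - 0.5 * lam * loss

        # Accept the bet only when the weighted expectation is positive;
        # at lam = 2, a 20-vs-10 gamble sits exactly at indifference.
        print(mixed_gamble_eu(gain=20, loss=10))   # 0.0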

    Optogenetic Mimicry of the Transient Activation of Dopamine Neurons by Natural Reward Is Sufficient for Operant Reinforcement

    Activation of dopamine receptors in forebrain regions, for minutes or longer, is known to be sufficient for positive reinforcement of stimuli and actions. However, the firing rate of dopamine neurons is increased for only about 200 milliseconds following natural reward events that are better than expected, a response which has been described as a “reward prediction error” (RPE). Although RPE drives reinforcement learning (RL) in computational models, it has not been possible to directly test whether the transient dopamine signal actually drives RL. Here we have performed optical stimulation of genetically targeted ventral tegmental area (VTA) dopamine neurons expressing Channelrhodopsin-2 (ChR2) in mice. We mimicked the transient activation of dopamine neurons that occurs in response to natural reward by applying a light pulse of 200 ms in VTA. When a single light pulse followed each self-initiated nose poke, it was sufficient in itself to cause operant reinforcement. Furthermore, when optical stimulation was delivered in separate sessions according to a predetermined pattern, it increased locomotion and contralateral rotations, behaviors that are known to result from activation of dopamine neurons. All three of the optically induced operant and locomotor behaviors were tightly correlated with the number of VTA dopamine neurons that expressed ChR2, providing additional evidence that the behavioral responses were caused by activation of dopamine neurons. These results provide strong evidence that the transient activation of dopamine neurons provides a functional reward signal that drives learning, in support of RL theories of dopamine function.
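
    The RPE rule the abstract invokes fits in a few lines of Python; the Rescorla-Wagner form and learning rate `alpha` below are illustrative assumptions rather than the paper's fitted model.

        # Sketch: value updating driven by a reward prediction error (RPE).
        def rpe_update(value, reward, alpha=0.1):
            delta = reward - value          # RPE: better or worse than expected
            return value + alpha * delta, delta

        v = 0.0
        for trial in range(5):
            # each reward stands in for a ~200 ms dopamine transient
            v, delta = rpe_update(v, reward=1.0)
            print(f"trial {trial}: V={v:.3f}, delta={delta:.3f}")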

    Altered Neural and Behavioral Dynamics in Huntington's Disease: An Entropy Conservation Approach

    Background: Huntington’s disease (HD) is an inherited condition that results in neurodegeneration of the striatum, the forebrain structure that processes cortical information for behavioral output. In the R6/2 transgenic mouse model of HD, striatal neurons exhibit aberrant firing patterns that are coupled with reduced flexibility in the motor system. The aim of this study was to test the patterns of unpredictability in brain and behavior in wild-type (WT) and R6/2 mice. Methodology/Principal Findings: Striatal local field potentials (LFPs) were recorded from 18 WT and 17 R6/2 mice (aged 8–11 weeks) while the mice were exploring a plus-shaped maze. We targeted LFP activity for up to 2 s before and 2 s after each choice-point entry. Approximate Entropy (ApEn) was calculated for LFPs, and Shannon Entropy was used to measure the probability of arm choice, as well as the likelihood of making consecutive 90-degree turns in the maze. We found that although the total number of choice-point crossings and the entropy of arm-choice probability were similar in both groups, R6/2 mice had more predictable behavioral responses (i.e., they were less likely to make 90-degree turns and to perform them in alternation with running straight down the same arm), while exhibiting more unpredictable striatal activity, as indicated by higher ApEn values. In both WT and R6/2 mice, however, behavioral unpredictability was negatively correlated with LFP ApEn. Conclusions/Significance: HD results in a perseverative exploration of the environment, occurring in concert with more unpredictable striatal activity.
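
    The two entropy measures named above can be sketched in Python as follows; the window length `m` and tolerance `r` are common defaults, not necessarily the settings used in the study.

        import numpy as np

        def shannon_entropy(counts):
            """Entropy (bits) of arm-choice probabilities from visit counts."""
            p = np.asarray(counts, dtype=float)
            p = p[p > 0] / p.sum()
            return -(p * np.log2(p)).sum()

        def approximate_entropy(x, m=2, r=None):
            """Approximate Entropy (ApEn) of a 1-D signal such as an LFP trace."""
            x = np.asarray(x, dtype=float)
            if r is None:
                r = 0.2 * x.std()
            def phi(m):
                windows = np.array([x[i:i + m] for i in range(len(x) - m + 1)])
                # Chebyshev distance between every pair of windows
                d = np.abs(windows[:, None] - windows[None, :]).max(axis=2)
                return np.log((d <= r).mean(axis=1)).mean()
            return phi(m) - phi(m + 1)

        print(shannon_entropy([10, 10, 10, 10]))   # 2.0 bits: unbiased arm choice
        print(approximate_entropy(np.sin(np.linspace(0, 8 * np.pi, 200))))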

    Prolonged dopamine signalling in striatum signals proximity and value of distant rewards

    Predictions about future rewarding events have a powerful influence on behaviour. The phasic spike activity of dopamine-containing neurons, and corresponding dopamine transients in the striatum, are thought to underlie these predictions, encoding positive and negative reward prediction errors. However, many behaviours are directed towards distant goals, for which transient signals may fail to provide sustained drive. Here we report an extended mode of reward-predictive dopamine signalling in the striatum that emerged as rats moved towards distant goals. These dopamine signals, which were detected with fast-scan cyclic voltammetry (FSCV), gradually increased or, in rare instances, decreased as the animals navigated mazes to reach remote rewards, rather than having phasic or steady tonic profiles. These dopamine increases (ramps) scaled flexibly with both the distance and size of the rewards. During learning, these dopamine signals showed spatial preferences for goals in different locations and readily changed in magnitude to reflect changing values of the distant rewards. Such prolonged dopamine signalling could provide sustained motivational drive, a control mechanism that may be important for normal behaviour and that can be impaired in a range of neurologic and neuropsychiatric disorders. Funding: National Institutes of Health (U.S.) (Grant R01 MH060379); National Parkinson Foundation (U.S.); Cure Huntington’s Disease Initiative, Inc. (Grant A-5552); Stanley H. and Sheila G. Sydney Fund.
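
    One common reading of such ramps (an interpretation, not the paper's analysis) is that dopamine tracks discounted value, which rises as the remaining distance to the goal shrinks and scales with reward size; a toy Python illustration with assumed parameters:

        # Toy model: V(d) = gamma**d * r for a reward r at distance d; the
        # signal ramps up on approach and scales with reward magnitude.
        gamma = 0.9
        for r in (1.0, 2.0):                                  # reward magnitude
            ramp = [gamma**d * r for d in range(10, -1, -1)]  # approaching the goal
            print([round(v, 2) for v in ramp])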

    Two spatiotemporally distinct value systems shape reward-based learning in the human brain

    Avoiding repeated mistakes and learning to reinforce rewarding decisions is critical for human survival and adaptive actions. Yet, the neural underpinnings of the value systems that encode different decision-outcomes remain elusive. Here, coupling single-trial electroencephalography with simultaneously acquired functional magnetic resonance imaging, we uncover the spatiotemporal dynamics of two separate but interacting value systems encoding decision-outcomes. Consistent with a role in regulating alertness and switching behaviours, an early system is activated only by negative outcomes and engages arousal-related and motor-preparatory brain structures. Consistent with a role in reward-based learning, a later system differentially suppresses or activates regions of the human reward network in response to negative and positive outcomes, respectively. Following negative outcomes, the early system interacts with and downregulates the late system through a thalamic interaction with the ventral striatum. Critically, the strength of this coupling predicts participants’ switching behaviour and avoidance learning, directly implicating the thalamostriatal pathway in reward-based learning.

    Subjective utility moderates bidirectional effects of conflicting motivations on pain perception

    Minimizing pain and maximizing pleasure are conflicting motivations when pain and reward co-occur. Decisions to prioritize reward consumption or pain avoidance are assumed to lead to pain inhibition or facilitation, respectively. Such decisions are a function of the subjective utility of the stimuli involved, i.e., the relative value assigned to the stimuli to compare the potential outcomes of a decision. To test perceptual pain modulation by varying degrees of motivational conflict and the role of subjective utility, we implemented a task in which healthy volunteers had to decide between accepting a reward at the cost of receiving a nociceptive electrocutaneous stimulus or rejecting both. Subjective utility of the stimuli was assessed by a matching task between the stimuli. Accepting a reward coupled to a nociceptive stimulus resulted in decreased perceived intensity, while rejecting the reward to avoid pain resulted in increased perceived intensity, but in both cases only if a high motivational conflict was present. Subjective utility of the stimuli involved moderated these bidirectional perceptual effects: the more a person valued money over pain, the more perceived intensity increased or decreased. These findings demonstrate pain modulation when pain and reward are simultaneously present and highlight the importance of subjective utility for such modulation.

    From uncertainty to reward: BOLD characteristics differentiate signaling pathways

    Background: Reward value and uncertainty are represented by dopamine neurons in monkeys by distinct phasic and tonic firing rates. Knowledge about the underlying differential dopaminergic pathways is crucial for a better understanding of dopamine-related processes. Using functional magnetic resonance blood-oxygen-level-dependent (BOLD) imaging, we analyzed brain activation in 15 healthy male subjects performing a gambling task, upon expectation of potential monetary rewards at different reward values and levels of uncertainty. Results: Consistent with previous studies, ventral striatal activation was related to both reward magnitudes and values. Activation in medial and lateral orbitofrontal brain areas was best predicted by reward uncertainty. Moreover, late BOLD responses relative to trial onset were due to expectation of different reward values and likely represent phasic dopaminergic signaling. Early BOLD responses were due to different levels of reward uncertainty and likely represent tonic dopaminergic signals. Conclusions: We conclude that differential dopaminergic signaling, as revealed in animal studies, is not only represented locally by involvement of distinct brain regions but also by distinct BOLD signal characteristics.
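
    For the two quantities manipulated in the task, a simple illustrative parametrization (an assumption of this sketch, not necessarily the study's) takes value as the expected payoff and uncertainty as the outcome variance, which peaks at a win probability of 0.5:

        # Sketch: value and uncertainty of a gamble paying `amount` with
        # probability p_win and nothing otherwise.
        def gamble_stats(p_win, amount):
            ev = p_win * amount                     # reward value
            var = amount**2 * p_win * (1 - p_win)   # uncertainty, maximal at p = 0.5
            return ev, var

        for p in (0.25, 0.5, 0.75):
            print(p, gamble_stats(p, amount=10))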

    Spatiotemporal neural characterization of prediction error valence and surprise during reward learning in humans

    Reward learning depends on accurate reward associations with potential choices. These associations can be attained with reinforcement learning mechanisms using a reward prediction error (RPE) signal (the difference between actual and expected rewards) for updating future reward expectations. Despite an extensive body of literature on the influence of RPE on learning, little has been done to investigate the potentially separate contributions of RPE valence (positive or negative) and surprise (absolute degree of deviation from expectations). Here, we coupled single-trial electroencephalography with simultaneously acquired fMRI, during a probabilistic reversal-learning task, to offer evidence of temporally overlapping but largely distinct spatial representations of RPE valence and surprise. Electrophysiological variability in RPE valence correlated with activity in regions of the human reward network promoting approach or avoidance learning. Electrophysiological variability in RPE surprise correlated primarily with activity in regions of the human attentional network controlling the speed of learning. Crucially, despite the largely separate spatial extent of these representations, our EEG-informed fMRI approach uniquely revealed a linear superposition of the two RPE components in a smaller network encompassing visuo-mnemonic and reward areas. Activity in this network was further predictive of stimulus value updating, indicating a comparable contribution of both signals to reward learning.
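
    The valence/surprise split described above amounts to taking the sign and the magnitude of the RPE; a minimal Python sketch (the framing only, not the paper's estimation pipeline):

        # Decompose an RPE into valence (sign) and surprise (magnitude).
        def rpe_components(reward, expected):
            delta = reward - expected
            valence = 1 if delta > 0 else -1 if delta < 0 else 0
            surprise = abs(delta)
            return valence, surprise

        print(rpe_components(reward=1.0, expected=0.75))   # (1, 0.25)
        print(rpe_components(reward=0.0, expected=0.75))   # (-1, 0.75)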

    Temporal-Difference Reinforcement Learning with Distributed Representations

    Temporal-difference (TD) algorithms have been proposed as models of reinforcement learning (RL). We examine two issues of distributed representation in these TD algorithms: distributed representations of belief and distributed discounting factors. Distributed representation of belief allows the believed state of the world to distribute across sets of equivalent states. Distributed exponential discounting factors produce hyperbolic discounting in the behavior of the agent itself. We examine these issues in the context of a TD RL model in which state-belief is distributed over a set of exponentially discounting “micro-agents” (µAgents), each of which has a separate discounting factor (γ). Each µAgent maintains an independent hypothesis about the state of the world and a separate value estimate of taking actions within that hypothesized state. The overall agent thus instantiates a flexible representation of an evolving world-state. As with other TD models, the value-error (δ) signal within the model matches dopamine signals recorded from animals in standard conditioning reward paradigms. The distributed representation of belief provides an explanation for the decrease in dopamine at the conditioned stimulus seen in overtrained animals, for the differences between trace and delay conditioning, and for transient bursts of dopamine seen at movement initiation. Because each µAgent also includes its own exponential discounting factor, the overall agent shows hyperbolic discounting, consistent with behavioral experiments.
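
    The discounting claim can be checked numerically: averaging exponential curves γ^t over a population of µAgents with different γ yields an approximately hyperbolic curve. The γ grid and the constant k below are arbitrary illustrative choices, not the paper's parameters.

        import numpy as np

        # Mixture of exponential discounts across 100 µAgents versus a
        # hyperbolic curve 1 / (1 + k*t); the two roughly coincide.
        t = np.arange(0, 20)
        gammas = np.linspace(0.05, 0.99, 100)   # one discount factor per µAgent
        mixture = np.mean([g**t for g in gammas], axis=0)
        hyperbolic = 1.0 / (1.0 + 0.9 * t)      # k = 0.9, chosen for comparison
        for ti in (0, 1, 5, 10):
            print(ti, round(mixture[ti], 3), round(hyperbolic[ti], 3))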